15 research outputs found

    Semantic Representations with Attention Networks for Boosting Image Captioning

    Image captioning has shown encouraging outcomes with Transformer-based architectures, which typically use attention-based methods to establish semantic associations between objects in an image for caption prediction. Nevertheless, when the appearance features of objects in an image display low interdependence, attention-based methods have difficulty capturing the semantic associations between them. To tackle this problem, additional knowledge beyond the task-specific dataset is often required to create captions that are more precise and meaningful. In this article, a semantic attention network is proposed to incorporate general-purpose knowledge into a Transformer attention block. This design combines the visual and semantic properties of internal image knowledge in one place for fusion, serving as a reference point that aids the learning of alignments between vision and language and improves visual attention and semantic association. The proposed framework is validated on the Microsoft COCO dataset, and experimental results demonstrate competitive performance against the current state of the art.
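    The fusion described above can be illustrated with a minimal numpy sketch (not the authors' implementation): visual region features act as attention queries, while keys and values are drawn from the concatenation of visual features and external semantic embeddings, so every region can also attend to general-purpose knowledge. All shapes, weights, and counts here are illustrative assumptions.

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def semantic_attention(visual, semantic, Wq, Wk, Wv):
        # Fuse visual region features with external semantic embeddings:
        # queries come from the visual features, while keys/values come
        # from both sources combined in one place.
        fused = np.concatenate([visual, semantic], axis=0)   # (n_v + n_s, d)
        q = visual @ Wq                                      # (n_v, d)
        k = fused @ Wk
        v = fused @ Wv
        scores = q @ k.T / np.sqrt(q.shape[-1])              # scaled dot-product
        return softmax(scores, axis=-1) @ v                  # (n_v, d)

    rng = np.random.default_rng(0)
    d = 8
    visual = rng.standard_normal((5, d))    # 5 detected image regions
    semantic = rng.standard_normal((3, d))  # 3 external knowledge embeddings
    W = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]
    out = semantic_attention(visual, semantic, *W)
    print(out.shape)
    ```

    Each output row is a knowledge-augmented region feature that a downstream caption decoder could consume in place of the purely visual one.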

    AgroSupportAnalytics: A cloud-based complaints management and decision support system for sustainable farming in Egypt

    Sustainable farming requires up-to-date advice on crop diseases, patterns, and adequate prevention actions to face developing circumstances. Currently, in developing countries such as Egypt, farmers' access to such information is extremely limited because agricultural support is either unavailable, inconsistent, or unreliable. The presented cloud-based complaints management and decision support system for sustainable farming in Egypt, named AgroSupportAnalytics, aims to resolve both the lack of support and advice for farmers and the inconsistencies of the current manual approach provided by agricultural experts. A key contribution is the development of an automated complaint management and decision support strategy, based on extensive requirements analysis tailored for Egypt. The solution applies knowledge discovery and analysis to agricultural data and farmers' complaints, deployed on a cloud platform, to provide farming stakeholders in Egypt with timely and suitable support. This paper presents the overall system architectural framework along with the information and storage services, which are based on the requirements specification phases of the project and on historical datasets covering the past 10 years of farmers' complaints and enquiries in Egypt.

    Application of region-based video surveillance in smart cities using deep learning

    Smart video surveillance helps build a more robust smart city environment. Cameras at varied angles act as smart sensors, collecting visual data from the smart city environment and transmitting it for further visual analysis. The transmitted visual data must be of high quality for efficient analysis, which is challenging when videos are transmitted over low-bandwidth communication channels. In the latest smart surveillance cameras, high-quality video transmission is maintained through video encoding techniques such as high-efficiency video coding. However, these techniques still provide limited capabilities, and the demand for high-quality encoding of salient regions such as pedestrians, vehicles, cyclists/motorcyclists, and roads in video surveillance systems is still not met. This work contributes an efficient salient-region-based surveillance framework for smart cities. The proposed framework integrates a deep-learning-based video surveillance technique that extracts salient regions from a video frame without information loss and then encodes the frame at reduced size. We applied this approach in diverse smart city case study environments to test the applicability of the framework. The proposed work achieves a bitrate saving of 56.92%, a peak signal-to-noise ratio gain of 5.35 dB, and salient-region segmentation accuracies of 92% and 96% on two different benchmark datasets. Consequently, the generation of less computationally demanding region-based video data makes the framework adaptable for improving surveillance solutions in smart cities.
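    The core idea of spending bits on salient regions can be mimicked with a toy numpy sketch (a loose stand-in for ROI-aware HEVC encoding, not the authors' pipeline): pixels inside a detector-supplied saliency mask are kept exact, while the background is coarsely quantized so it compresses to far fewer distinct values. The mask, frame, and quantization step are all illustrative.

    ```python
    import numpy as np

    def region_based_quantize(frame, mask, bg_step=32):
        # Keep salient regions untouched; coarsely quantize the background
        # so it carries far less information (a stand-in for allocating
        # fewer bits to non-salient areas during encoding).
        out = frame.copy()
        bg = ~mask
        out[bg] = (frame[bg] // bg_step) * bg_step
        return out

    frame = np.arange(64, dtype=np.uint8).reshape(8, 8)   # toy 8x8 frame
    mask = np.zeros((8, 8), dtype=bool)
    mask[2:6, 2:6] = True     # pretend a detector marked this region salient
    enc = region_based_quantize(frame, mask)
    print(np.unique(enc[~mask]))
    ```

    In a real codec the background would get a higher quantization parameter rather than explicit rounding, but the quality asymmetry is the same.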

    Uncertainty assisted robust tuberculosis identification with Bayesian convolutional neural networks

    Tuberculosis (TB) is an infectious disease that can lead to death if left untreated. TB detection involves extracting complex TB manifestation features such as lung cavities, air space consolidation, endobronchial spread, and pleural effusions from chest X-rays (CXRs). A deep learning approach, the convolutional neural network (CNN), has the ability to learn complex features from CXR images. The main problem is that a CNN does not consider uncertainty when classifying CXRs with a softmax layer; it fails to present the true probability of CXRs and cannot differentiate confusing cases during TB detection. This paper presents a solution for TB identification using a Bayesian convolutional neural network (B-CNN), which deals with uncertain cases that have low discernibility between TB and non-TB manifested CXRs. The proposed B-CNN-based TB identification methodology is evaluated on two TB benchmark datasets, Montgomery and Shenzhen. For training and testing we used the Google Colab platform, which provides an NVidia Tesla K80 with 12 GB of VRAM, a single core of a 2.3 GHz Xeon processor, 12 GB of RAM, and 320 GB of disk. The B-CNN achieves 96.42% and 86.46% accuracy on the two datasets, respectively, compared to state-of-the-art machine learning and CNN approaches. Moreover, the B-CNN validates its results by filtering out CXRs as confusing cases when the variance of its predicted outputs exceeds a certain threshold. The results demonstrate the superiority of the B-CNN for identifying TB and non-TB sample CXRs compared to its counterparts in terms of accuracy, variance in the predicted probabilities, and model uncertainty.
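    The variance-based filtering step can be sketched with Monte Carlo sampling, a common approximation for Bayesian CNN inference (the abstract does not specify the authors' exact posterior approximation, so this is an illustrative assumption): run several stochastic forward passes, average the softmax outputs, and flag a CXR as confusing when the predicted probability of the winning class varies too much. The toy "models" and threshold below are hypothetical.

    ```python
    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def mc_predict(stochastic_logits, T=50, var_threshold=0.02):
        # T stochastic forward passes (e.g. dropout left active at test
        # time); average the softmax outputs and flag high-variance
        # cases as confusing, as the B-CNN filtering step does.
        probs = np.stack([softmax(stochastic_logits()) for _ in range(T)])
        mean = probs.mean(axis=0)
        var = probs.var(axis=0)[mean.argmax()]   # variance of winning class
        return mean.argmax(), var, var > var_threshold

    rng = np.random.default_rng(1)
    # Hypothetical stochastic models: fixed logits plus dropout-like noise.
    clear_case = lambda: np.array([4.0, 0.0]) + rng.normal(0, 0.3, 2)
    confusing_case = lambda: np.array([0.1, 0.0]) + rng.normal(0, 1.0, 2)

    _, v_clear, flag_clear = mc_predict(clear_case)
    _, v_conf, flag_conf = mc_predict(confusing_case)
    print(v_clear, v_conf)
    ```

    Cases flagged this way would be routed to a radiologist instead of being reported with a misleadingly confident softmax score.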

    COMPARISON OF SUSTAINED PRESSURE VS ISCHEMIC COMPRESSION ON TRIGGER POINTS IN CHRONIC MYOFASCIAL PAIN MANAGEMENT

    ABSTRACT OBJECTIVE: To determine the effect of different trigger point approaches in improving chronic myofascial pain. METHODS: This randomized controlled trial was conducted at Railway General Hospital, Rawalpindi, Pakistan from July to December 2016. Thirty-seven male participants who fulfilled the inclusion criteria (persistent pain >6 months, gradual onset of pain, and impaired level of activity) were randomly allocated through the lottery method to the sustained pressure (Group A) and ischemic compression (Group B) treatment groups. Both groups received eight treatment sessions and were evaluated at baseline and after the 8th visit using the Numeric Pain Rating Scale (NPRS) and the Chronic Pain Acceptance Questionnaire (CPAQ). RESULTS: Within Group A, the pre- and post-treatment means for NPRS were 5.05±1.17 and 2.63±0.955 (p <0.001). Pre- and post-treatment CPAQ activity engagement values were 32.00±2.42 and 41.74±2.53 (p <0.001). Pre- and post-treatment CPAQ pain willingness values were 29.42±3.04 and 32.63±2.91 (p <0.001). The pre- and post-treatment CPAQ sum was 61.42±3.67 and 73.84±3.64 (p 0.05). Pre- and post-treatment values for the CPAQ sum were 64.61±2.42 and 75.72±1.12 (p<0.001). CONCLUSION: Improvement in pain relief was observed in both groups, but there was no significant difference in pain relief between the ischemic compression and sustained pressure groups.

    Towards counterfactual and contrastive explainability and transparency of DCNN image classifiers

    Explainability of deep convolutional neural networks (DCNNs) is an important research topic that seeks to uncover the reasons behind a DCNN model’s decisions and to improve their understanding and reliability in high-risk environments. In this regard, we propose a novel method for generating interpretable counterfactual and contrastive explanations for DCNN models. The proposed method is model-intrusive: it probes the internal workings of a DCNN instead of altering the input image to generate explanations. Given an input image, we provide contrastive explanations by identifying the most important filters in the DCNN, representing the features and concepts that separate the model’s decision between classifying the image as the originally inferred class or some other specified alter class. We provide counterfactual explanations by specifying the minimal changes necessary in these filters so that a contrastive output is obtained. Using the identified filters and concepts, our method can provide contrastive and counterfactual reasons behind a model’s decisions and makes the model more transparent. One interesting application of this method is misclassification analysis, where we compare the concepts identified for a particular input image with class-specific concepts to establish the validity of the model’s decisions. The proposed method is compared with the state of the art and evaluated on the Caltech-UCSD Birds (CUB) 2011 dataset to show the usefulness of the explanations provided.
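    The filter-identification idea can be sketched on a toy linear classification head (an illustrative simplification, not the paper's procedure): score each filter by how much its pooled activation pushes the decision toward the original class and away from the alter class, and report the top-k filters as the contrastive explanation. All names and sizes below are hypothetical.

    ```python
    import numpy as np

    def contrastive_filters(activations, w_orig, w_alter, k=3):
        # Per-filter contribution to the class-score gap: filters with the
        # largest positive contribution are the ones that most separate
        # the original class from the alter class.
        contribution = activations * (w_orig - w_alter)
        return np.argsort(contribution)[::-1][:k]   # indices of top-k filters

    rng = np.random.default_rng(2)
    n_filters = 16
    activations = rng.random(n_filters)      # global-average-pooled filter outputs
    w_orig = rng.standard_normal(n_filters)  # toy linear head: original class
    w_alter = rng.standard_normal(n_filters) # toy linear head: alter class
    top = contrastive_filters(activations, w_orig, w_alter)
    print(top)
    ```

    A counterfactual explanation would then ask for the smallest perturbation of exactly these filters that flips the class-score gap, which in this linear toy reduces to reducing their activations.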

    Fingerprint frequency normalisation and enhancement using two‐dimensional short‐time Fourier transform analysis

    A fingerprint image with non‐uniform ridge frequencies can be considered a two‐dimensional dynamic signal. Non‐uniform stress on the sensing area during fingerprint acquisition may result in a non‐linear distortion that disturbs the local frequency of the ridges, adversely affecting matching performance. This study presents a new approach based on short‐time Fourier transform analysis and locally adaptive contextual filtering for frequency distortion removal and enhancement. In the proposed approach, the fingerprint image is divided into sub‐images to determine the local dominant frequency and orientation. Gaussian directional band‐pass filtering is then adaptively applied in the frequency domain. The filtered sub‐images are then combined in the spatial domain using a novel technique to obtain an enhanced fingerprint image with high ridge quality and uniform inter‐ridge distance. Simulation results show the efficacy of the proposed enhancement technique compared to other well‐known contextual‐filtering‐based enhancement techniques reported in the literature.
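    The per-block dominant-frequency step can be sketched with a plain 2-D FFT (a minimal sketch of the analysis stage only, not the full enhancement pipeline): suppress the DC component, take the spectral peak, and fold the conjugate-symmetric half so the result is a non-negative frequency pair. The synthetic ridge pattern below stands in for a fingerprint sub-image.

    ```python
    import numpy as np

    def dominant_frequency(block):
        # 2-D FFT of a local block; the non-DC spectral peak gives the
        # local ridge frequency (in cycles per block) along each axis,
        # from which orientation can also be derived.
        F = np.abs(np.fft.fft2(block - block.mean()))
        F[0, 0] = 0                          # suppress any residual DC
        iy, ix = np.unravel_index(F.argmax(), F.shape)
        n_y, n_x = block.shape
        fy = min(iy, n_y - iy)               # fold conjugate-symmetric half
        fx = min(ix, n_x - ix)
        return fy, fx

    n = 32
    y, x = np.mgrid[0:n, 0:n]
    ridges = np.sin(2 * np.pi * 4 * x / n)   # vertical ridges, 4 cycles across
    print(dominant_frequency(ridges))
    ```

    In the full method this estimate would parameterize the Gaussian directional band-pass filter applied to that sub-image before spatial recombination.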

    Convolutional Neural Networks Based Time-Frequency Image Enhancement For the Analysis of EEG Signals

    Quadratic time-frequency (TF) methods are commonly used for the analysis, modeling, and classification of time-varying non-stationary electroencephalogram (EEG) signals. Commonly employed TF methods suffer from an inherent tradeoff between cross-term suppression and preservation of auto-terms. In this paper, we propose a new convolutional neural network (CNN) based approach to enhancing TF images. The proposed method trains a CNN using the Wigner-Ville distribution as the input image and, as the output image, the ideal time-frequency distribution with the total concentration of signal energy along the IF curves. The results show significant improvement compared to other state-of-the-art TF enhancement methods. The code for reproducing the results can be accessed on GitHub at https://github.com/nabeelalikhan1/CNN-based-TF-image-enhancement.
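    The CNN's input representation, the discrete Wigner-Ville distribution, can be computed directly (a minimal pseudo-WVD sketch, not the repository's code): for each time instant, form the instantaneous autocorrelation over a symmetric lag window and FFT along the lag axis. Note that the discrete WVD of an analytic tone at bin k peaks at bin 2k because of the doubled phase in z(t+τ)z*(t−τ).

    ```python
    import numpy as np

    def wigner_ville(z):
        # Discrete pseudo Wigner-Ville distribution of an analytic signal:
        # rows index time, columns index (doubled) frequency bins.
        N = len(z)
        W = np.zeros((N, N))
        for t in range(N):
            taumax = min(t, N - 1 - t)           # symmetric lag window
            r = np.zeros(N, dtype=complex)
            for tau in range(-taumax, taumax + 1):
                r[tau % N] = z[t + tau] * np.conj(z[t - tau])
            W[t] = np.fft.fft(r).real            # FFT along the lag axis
        return W

    N = 64
    n = np.arange(N)
    z = np.exp(2j * np.pi * 8 * n / N)   # analytic tone at frequency bin 8
    W = wigner_ville(z)
    peak_bin = np.argmax(W[N // 2])      # frequency peak at mid-signal
    print(peak_bin)
    ```

    For multicomponent EEG signals this image contains cross-terms between components, which is exactly what the trained CNN is asked to remove while keeping the auto-terms concentrated.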

    Computationally Efficient Light Field Image Compression Using a Multiview HEVC Framework

    The acquisition of the spatial and angular information of a scene using light field (LF) technologies supports a wide range of post-processing applications, such as scene reconstruction, refocusing, virtual view synthesis, and so forth. The additional angular information possessed by LF data increases the size of the overall data captured while offering the same spatial resolution. The main contributor to the size of the captured data (i.e., the angular information) contains a high degree of correlation that is exploited by state-of-the-art video encoders by treating the LF as a pseudo video sequence (PVS). The interpretation of the LF as a single PVS restricts the encoding scheme to utilizing only the one-dimensional angular correlation present in the LF data. In this paper, we present an LF compression framework that efficiently exploits the spatial and angular correlation using a multiview extension of high-efficiency video coding (MV-HEVC). The input LF views are converted into multiple PVSs and organized hierarchically. The rate-allocation scheme takes the assigned organization of frames into account and distributes quality/bits among them accordingly. Subsequently, the reference picture selection scheme prioritizes the reference frames based on the assigned quality. The proposed compression scheme is evaluated following the common test conditions set by JPEG Pleno. It performs 0.75 dB better than state-of-the-art compression schemes and 2.5 dB better than the x265-based JPEG Pleno anchor scheme. Moreover, an optimized motion-search scheme is proposed in the framework that reduces the computational complexity (in terms of the number of sum-of-absolute-difference [SAD] computations) of motion estimation by up to 87% with a negligible loss in visual quality (approximately 0.05 dB).
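    The view-to-PVS reorganization can be sketched in plain Python (an illustrative guess at one reasonable ordering, not the paper's exact scan or rate model): each row of the view grid becomes one pseudo video sequence, columns are snaked so consecutive frames stay spatially adjacent, and a toy two-level hierarchy assigns the key frame a lower QP (higher quality) than the rest.

    ```python
    def lf_to_pvs(views, rows, cols):
        # Arrange a rows x cols grid of light-field views into one pseudo
        # video sequence per row; snake the column order so consecutive
        # frames remain spatially adjacent (higher temporal correlation
        # for the video encoder to exploit).
        pvs = []
        for r in range(rows):
            order = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
            pvs.append([views[r * cols + c] for c in order])
        return pvs

    def hierarchical_qp(num_frames, base_qp=22, step=3):
        # Toy two-level hierarchy: the key frame gets the lowest QP
        # (best quality) so later frames can reference it cheaply.
        return [base_qp if i == 0 else base_qp + step for i in range(num_frames)]

    views = [f"v{i}" for i in range(9)]   # 3x3 grid of view identifiers
    pvs = lf_to_pvs(views, 3, 3)
    qps = hierarchical_qp(3)
    print(pvs[1], qps)
    ```

    MV-HEVC then treats each PVS as one "view" and additionally predicts across PVSs, which is how the two-dimensional angular correlation gets exploited.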

    Face recognition with Bayesian convolutional networks for robust surveillance systems

    Recognition of facial images is one of the most challenging research issues in surveillance systems due to problems including varying pose, expression, illumination, and resolution. The robustness of a recognition method relies strongly on the strength of the extracted features and the ability to deal with low-quality face images. The capacity to learn robust features from raw face images makes deep convolutional neural networks (DCNNs) attractive for face recognition. DCNNs use softmax to quantify the model's confidence in a class for an input face image when making a prediction. However, softmax probabilities are not a true representation of model confidence and are often misleading in regions of feature space not represented by the available training examples. The primary goal of this paper is to improve the efficacy of face recognition systems by dealing with false positives through model uncertainty. Experiments on open-source datasets show that model uncertainty improves accuracy by 3–4% over DCNNs and conventional machine learning techniques.
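    One standard way to turn model uncertainty into a false-positive filter, shown here as a sketch rather than the paper's method, is the predictive entropy of the Monte Carlo-averaged class distribution: a probe face whose sampled predictions are spread across identities gets high entropy and is rejected instead of reported as a match. The Dirichlet samples and threshold below are hypothetical stand-ins for real Bayesian network outputs.

    ```python
    import numpy as np

    def predictive_entropy(probs):
        # Entropy of the MC-averaged predictive distribution: high entropy
        # means the model is unsure which identity this face belongs to,
        # so the match should be rejected rather than reported as a
        # (possibly false) positive identification.
        mean = probs.mean(axis=0)
        return -np.sum(mean * np.log(mean + 1e-12))

    rng = np.random.default_rng(3)
    # Hypothetical MC samples of class probabilities for two probe faces.
    sharp = rng.dirichlet([50, 1, 1], size=20)  # consistently one identity
    flat = rng.dirichlet([2, 2, 2], size=20)    # spread across identities
    h_sharp, h_flat = predictive_entropy(sharp), predictive_entropy(flat)
    reject = lambda h, thresh=0.5: h > thresh   # illustrative threshold
    print(reject(h_sharp), reject(h_flat))
    ```

    Unlike a raw softmax score, this quantity grows when the stochastic passes disagree, which is precisely the failure mode the abstract attributes to plain DCNN confidence.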